Performance evaluation of metaheuristics-tuned recurrent neural networks for electroencephalography anomaly detection

Electroencephalography (EEG) serves as a diagnostic technique for measuring brain waves and brain activity. Despite its precision in capturing the brain's electrical activity, certain factors, such as environmental influences during the test, can affect the objectivity and accuracy of EEG interpretations. Interpretation challenges can significantly impair the accurate reading of EEG findings, even when advanced techniques are used to minimize the influence of artifacts. To address this issue, artificial intelligence (AI) is utilized in this study to analyze anomalies in EEG signals for epilepsy detection. Recurrent neural networks (RNNs) are AI techniques specifically designed to handle sequential data, making them well-suited for precise time-series tasks. While AI methods, including RNNs and artificial neural networks (ANNs), hold great promise, their effectiveness heavily relies on the values assigned to their hyperparameters, which are crucial to their performance on a given task. To tune RNN performance, hyperparameter selection is approached as a typical optimization problem, and metaheuristic algorithms are employed to solve it. A modified hybrid sine cosine algorithm has been developed and used to further improve hyperparameter optimization. To facilitate testing, publicly available real-world EEG data is utilized. A dataset is constructed using captured data from healthy individuals and archived data from patients confirmed to be affected by epilepsy, as well as data captured during an active seizure. Two experiments have been conducted using the generated dataset. In the first experiment, models were tasked with detecting anomalous EEG activity. The second experiment required models to distinguish normal and anomalous activity, as well as to detect occurrences of seizures, from EEG data. Considering the modest sample size used for classification (one second of data, 158 data points), the models demonstrated decent outcomes.
The obtained outcomes are compared with those generated by other cutting-edge metaheuristics and subjected to rigorous statistical validation, and the results are interpreted.


Introduction
Electroencephalography (EEG) is a diagnostic method that determines and measures brain waves and brain activity. As a non-invasive, painless, and relatively cheap method, it has wide diagnostic application. The most common indication for EEG is diagnosing or monitoring different types of epilepsy, but it can also be used in diagnosing numerous other neurological disorders such as vascular diseases, tumor processes, infectious diseases, degenerative brain diseases (dementia, Parkinson's disease, ALS), sleep disorders, narcolepsy, etc. The method has been successfully applied as a scientific tool for almost 100 years (Berger, 1929). There are two primary methods for collecting EEG data. Intracranial EEG (Jobst et al., 2020) involves a surgical procedure that places electrodes onto the surface of the brain. However, a more popular method for capturing EEG signals is non-invasive scalp EEG, which is preferred precisely because it is non-invasive.
Although EEG is a very detailed and precise method of measuring the electrical activity of the brain, certain factors (such as environmental influences acting during the test itself) can prevent a completely objective and realistic picture and interpretation of EEG records. In general, the main problem is represented by artifacts, both technical and biological. The main sources of technical artifacts are external audio and visual stimuli from the environment, room temperature, and incoming electric and electromagnetic noise from transmission lines, electric lights, or other electromagnetic fields. Poor contact and positioning of the electrodes (the electric field, and thus the signal strength, decreases with the square of the distance from the source) lead to high impedance and thus additionally increase the electromagnetic influence of artifacts. Inadequate electrode materials, wrongly adjusted filters, and quantization and amplification noise during analog-to-digital conversion can further complicate adequate EEG analysis. The main sources of biological artifacts are uncontrolled muscle movements (e.g., neck, face), blinking, or eye movements of the subject. Sweating (due to physical discomfort during recording) can also be problematic. Additional complications can occur when collecting data from patients affected by epilepsy, as it can be difficult to discern neurological activity from involuntary muscle spasms caused by seizures. Capturing data during a seizure episode can also prove difficult, as seizures can occur spontaneously and sporadically. Nevertheless, detecting anomalous neurological activity using EEG is an efficient and non-invasive way to diagnose epilepsy. Finally, inadequate interpretation of the recording by the physician, despite all the technical achievements that minimize the influence of artifacts, can be crucial in the misinterpretation of EEG findings (Müller-Putz, 2020).
As a non-invasive method, EEG can have advantages over other imaging methods, especially when patients have absolute or relative contraindications for contrast-enhanced (NMR or CT) imaging: allergic reactions, advanced chronic renal insufficiency, uncontrolled diabetes mellitus with the risk of lactic acidosis, etc. The fact that it is a safe and significantly cheaper method (it does not require the use of contrast) further recommends it for the early diagnosis of these diseases, which is of inestimable importance for timely therapy in these most serious conditions. Early diagnosis enables timely treatment, which is crucial for improving the therapeutic outcome, especially in patients with the most severe progressive neurological diseases (Armstrong and Okun, 2020; Symonds et al., 2021; Goutman et al., 2022; Hakeem et al., 2022).
Preceding works have explored the application of AI for medical diagnosis (Jovanovic et al., 2023a). However, few works have explored the potential of time-series classification for anomaly detection in neurodiagnostics. The potential of networks capable of accounting for temporal variables, such as recurrent neural networks (RNNs), has yet to be fully explored for EEG. Since EEG data is sequential and RNNs were developed specifically for this class of problem, there is notable application potential. This work therefore proposes a methodology based on RNNs for anomaly detection in EEG readings. The dataset is composed from a publicly available real-world patient dataset (Andrzejak et al., 2001). The testing dataset consists of segments of normal EEG measurements, anomalous EEG measurements of patients suffering from epilepsy, and EEG activity recorded during an active seizure.
Two experiments have been conducted. The first experiment involved detecting anomalous activity and was formulated as binary classification, as two classes exist: normal and anomalous. The second experiment tackled the problem of determining the type of anomalous activity and was formulated as a multiclass classification problem, as the outcome can be classified as normal, anomalous, or seizure. A detailed description is provided in Section 4.1.
To improve the performance of the constructed models, several cutting-edge metaheuristic algorithms have been applied to optimizing the hyperparameters of the RNN, as well as to selecting the network architecture best suited to the task. A modified version of the well-known sine cosine algorithm (SCA) (Mirjalili, 2016) is introduced specifically for this study. Owing to their ability to tackle even NP-hard problems, metaheuristic algorithms are a popular choice for exploring the large search space associated with the selection of RNN hyperparameters. The outcomes of the simulations carried out in this research have been validated through rigorous statistical testing, and the best-performing models are interpreted using explainable AI techniques to determine the features that contribute to model decisions.
The primary contributions of this work can be summarized as the following.
• the construction of a combined EEG dataset that can be used to evaluate seizure and anomaly detection;
• improvements to the classification methodologies available for EEG signals, using time-series classification based on RNNs;
• a modified metaheuristic for tuning RNNs to classify EEG signals;
• the interpretation of the best-performing models to determine feature importance for anomaly detection;
• an extension of research on this topic that fills the research gap concerning the use of RNNs for EEG signal anomaly detection.
The rest of this manuscript is structured as follows. Section 2 provides the background and a literature survey on RNNs, metaheuristic optimization, and the general applications of AI algorithms in medicine. Section 3 presents the basic SCA algorithm, followed by the proposed alterations of the baseline algorithm. The simulation setup used for the experiments is given in Section 4, while the simulation outcomes are shown in Section 5, accompanied by the statistical analysis of the results and the interpretation of the top-performing model. Section 6 summarizes the research, gives suggestions for possible future work, and concludes this manuscript.

Related works and background
Despite advances in imaging, EEG remains the basic test for the diagnosis of epilepsy. Not only can it confirm the diagnosis (and clarify the type of epilepsy), but it can also play a role in therapeutic decisions (e.g., whether to stop treatment in patients without seizures) and have prognostic significance (e.g., evaluating critically ill patients for possible status epilepticus or development of encephalopathy) (Trinka and Leitinger, 2022). Apart from the diseases mentioned, this method is also widely used in the early diagnosis of dementia (Al-Qazzaz et al., 2014), Alzheimer's disease (Stam et al., 2023), brain tumors (Ajinkya et al., 2021), sleep disorders (Kaskie and Ferrarelli, 2019; Steiger and Pawlowski, 2019), and the most severe neurodegenerative diseases (Kidokoro et al., 2020). Artificial intelligence methods show immense potential in the detection of different medical conditions.
AI technologies, from machine learning to deep learning, play a critical role in improving new clinical systems, managing patient information and records, and treating various ailments. AI approaches can also support the effective diagnosis of various diseases.
This section first introduces recurrent neural networks and their most important applications in different domains. Afterwards, a brief survey of metaheuristic optimization is provided.
Finally, an overview of general AI applications in medicine is given.

Recurrent neural networks
A recurrent neural network (RNN) (Jain and Medsker, 1999) is a modified version of a traditional neural network designed to handle sequential data. While it retains many of the components found in neural networks, such as neurons and connections, an RNN can additionally apply the same operation repeatedly to sequential inputs through recurrent connections. This allows the RNN to store information from previously processed values and use it together with subsequent inputs. Given an input sequence $I = (i_1, i_2, i_3, \ldots, i_T)$, the network performs the operation described in Eq. 1 at each step $t$:
$$(\hat{o}_t, h_t) = \phi_W(i_t, h_{t-1}) \quad (1)$$
where $\hat{o}_t$ represents the output and $h_t$ denotes the hidden state at time $t$, and $\phi_W$ is a neural network parameterized by the weights $W$.
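To make Eq. 1 concrete, the following dependency-free Python sketch applies one shared function phi_W at every time step. It assumes the simplest Elman-style cell (tanh hidden activation, linear output) and uses hypothetical weight names (`w_ih`, `w_hh`, `w_ho`); it illustrates the recurrence only and is not the RNN architecture tuned in this study.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Weights = Tuple[Matrix, Matrix, Vector, Matrix, Vector]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(i_t: Sequence[float], h_prev: Sequence[float],
             w_ih: Matrix, w_hh: Matrix, b_h: Vector,
             w_ho: Matrix, b_o: Vector) -> Tuple[Vector, Vector]:
    """One application of phi_W: (i_t, h_{t-1}) -> (o_t, h_t)."""
    pre = [a + b + c for a, b, c in
           zip(matvec(w_ih, i_t), matvec(w_hh, h_prev), b_h)]
    h_t = [math.tanh(z) for z in pre]                      # new hidden state
    o_t = [z + b for z, b in zip(matvec(w_ho, h_t), b_o)]  # linear read-out
    return o_t, h_t


def rnn_forward(inputs: Sequence[Sequence[float]], h0: Vector,
                weights: Weights) -> Tuple[List[Vector], Vector]:
    """Apply the same step, with shared weights W, to every input i_1..i_T."""
    h, outputs = list(h0), []
    for i_t in inputs:
        o_t, h = rnn_step(i_t, h, *weights)
        outputs.append(o_t)
    return outputs, h


# Usage: a one-unit RNN; the hidden state carries the first input forward.
weights: Weights = ([[1.0]], [[1.0]], [0.0], [[1.0]], [0.0])
outputs, h_T = rnn_forward([[1.0], [0.0]], [0.0], weights)
print(outputs)  # second output is non-zero although its input is 0.0
```

Because the same weights are reused at every step, the second output is non-zero even though its input is zero: the hidden state carries information from the first input forward.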
Initially developed to enable artificial neural networks (ANNs) (Krogh, 2008) to handle sequences of data, RNNs use recurrent connections so that previous outputs influence future predictions. This characteristic makes RNNs particularly well-suited for accurate time-series forecasting with relatively simple neural network architectures. However, simple RNN architectures still struggle to provide accurate results on long data sequences, because gradients tend to vanish or explode when propagated across many time steps. To address this challenge, the attention mechanism (Olah and Carter, 2016) offers a promising solution.
RNNs have found numerous applications across various domains due to their ability to handle sequential data and capture temporal dependencies. RNNs are extensively used in natural language processing (NLP) tasks, such as machine translation, language modeling, sentiment analysis, speech recognition, and text generation (Jelodar et al., 2020; Sorin et al., 2020; Zhou et al., 2020; Nasir et al., 2021). Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are popular RNN variants commonly used in NLP tasks (Shewalkar et al., 2019; Sherstinsky, 2020; Yang et al., 2020). RNNs are also well-suited for time-series forecasting tasks, such as stock price prediction, weather forecasting, and demand prediction in sales or finance, as they are capable of capturing patterns and trends in sequential data (Alassafi et al., 2022; Amalou et al., 2022; Bhoj and Bhadoria, 2022; Freeborough and van Zyl, 2022; Hou et al., 2022; Siłka et al., 2022). RNNs are also frequently employed in speech recognition systems, speech synthesis (text-to-speech), and speaker identification, as they can process audio data as a sequence of frames (Shewalkar et al., 2019; Zhang et al., 2020; Oruh et al., 2022). Finally, RNNs can be applied to video data for tasks like action recognition, video captioning, and video summarization, where temporal information is crucial for understanding the content (Liu et al., 2019; Yuan et al., 2019; Zhao et al., 2019).
RNNs have also been successful in the medical domain. Successful applications include online medical pre-diagnostic support (Zhou et al., 2020), cyber-attack and intrusion detection in medical Internet of Things devices (Saheed and Arowolo, 2021), MRI and CT image processing (Rajeev et al., 2019; Sabbavarapu et al., 2021; Islam et al., 2022), and medical time-series analysis (Li and Xu, 2019; Tan et al., 2020), to name a few.

Metaheuristic optimization
While AI methods such as RNNs and ANNs show immense potential, their effectiveness heavily relies on the values assigned to a set of parameters known as hyperparameters. Modern methods offer a wide range of control parameters that enable networks to achieve good overall performance while allowing their internal operations to be fine-tuned for specific problems. However, manually selecting appropriate values for these hyperparameters can be challenging, as modern methods often involve several dozen parameters. Thus, automated methods become crucial for facilitating the selection process. Given the broad range of possible parameter values, this task quickly becomes NP-hard, making it practically impossible to solve with traditional methods. Consequently, novel approaches must be discovered and adapted to address this challenge.
Metaheuristic algorithms present a feasible solution to this problem. Rather than employing a deterministic approach, they utilize a stochastic search strategy. These algorithms do not guarantee finding the optimal solution in a single run, but they increase the statistical probability of locating the true optimum with each iteration. This makes it feasible to solve NP-hard problems within a reasonable time frame and with manageable computational resources, which is why metaheuristic optimization algorithms are a popular choice for hyperparameter tuning. By defining the selection of hyperparameters as a typical maximization problem, optimal values can be determined, improving algorithm performance by adjusting its behavior to suit the specific task at hand. Researchers have developed numerous metaheuristic algorithms for diverse problem domains, drawing inspiration from various sources.
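In practice, each member of a metaheuristic population is a real-valued vector, so treating hyperparameter selection as an optimization problem requires decoding that vector into concrete values. The sketch below shows one common way to do this (clipping to bounds and rounding integer-valued entries); the hyperparameter names and bounds in `SEARCH_SPACE` are hypothetical placeholders, not the search space used in this study.

```python
from typing import Dict, List, Tuple

# Hypothetical search space: name -> (lower bound, upper bound, integer-valued?).
# The bounds actually used in this study are listed in the experimental section.
SEARCH_SPACE: Dict[str, Tuple[float, float, bool]] = {
    "learning_rate": (0.0001, 0.01, False),
    "dropout": (0.05, 0.2, False),
    "epochs": (30, 60, True),
    "recurrent_layers": (1, 3, True),
    "neurons_per_layer": (16, 64, True),
}


def decode(solution: List[float]) -> Dict[str, float]:
    """Map one continuous candidate vector onto concrete hyperparameter values."""
    if len(solution) != len(SEARCH_SPACE):
        raise ValueError("solution length must match the number of hyperparameters")
    params: Dict[str, float] = {}
    for x, (name, (lo, hi, is_int)) in zip(solution, SEARCH_SPACE.items()):
        value = min(max(x, lo), hi)              # keep the value inside its bounds
        params[name] = int(round(value)) if is_int else value
    return params


print(decode([0.005, 0.5, 45.4, 0.0, 31.6]))
# {'learning_rate': 0.005, 'dropout': 0.2, 'epochs': 45, 'recurrent_layers': 1, 'neurons_per_layer': 32}
```

During optimization, the fitness of a candidate would be obtained by training an RNN with the decoded values and scoring it on held-out data; that step is omitted here.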
Stochastic algorithms, known as metaheuristics, are extensively employed in computer science to address NP-hard problems, as deterministic methods are impractical in such cases.These metaheuristic algorithms can be classified into different categories based on the natural phenomena they emulate to guide the search process.Examples include evolution and ant behavior for nature-inspired methods, physical phenomena like storms and gravitational waves, human behavior such as teaching and learning or brainstorming, and mathematical laws like oscillations of trigonometric functions.
Swarm intelligence algorithms are rooted in the collective behavior of large groups consisting of relatively simple units, such as bird flocks or insect swarms.These groups exhibit remarkably synchronized and sophisticated behavioral patterns during essential survival activities such as hunting, scavenging, breeding, and predator avoidance.Swarm intelligence methods, including ant colony optimization (ACO) (Dorigo et al., 2006), particle swarm optimization (PSO) (Wang et al., 2018), artificial bee colony (ABC) (Karaboga and Basturk, 2007), bat algorithm (BA) (Yang and Gandomi, 2012), and firefly algorithm (FA) (Yang and Slowik, 2020), have proven effective in solving a wide range of NP-hard problems in real-life scenarios.In recent years, a particularly efficient family of metaheuristics has emerged that relies on mathematical functions and their properties to facilitate the search process.Prominent examples within this family include the sine-cosine algorithm (Mirjalili, 2016) and the arithmetic optimization algorithm (AOA) (Abualigah et al., 2021).
The diversity of population-based algorithms is motivated by the no-free-lunch (NFL) theorem (Wolpert and Macready, 1997), which states that no universal approach can find the best solution for all optimization problems. Therefore, selecting an appropriate metaheuristic is crucial, as a technique that performs well on one problem may not be equally successful on another. This explains both the availability of many metaheuristic methods and the need to adapt the algorithm to the specific optimization task at hand.

Brief overview of AI applications in medicine
To identify diseases that require early diagnosis, such as those related to the skin, the heart, and Alzheimer's disease, researchers have utilized a variety of AI-based techniques, including machine and deep learning models. Aiming for the best possible accuracy, Dabowsa et al. (2017) utilized a backpropagation neural network to diagnose skin diseases. Uysal and Ozturk (2020) examined T1-weighted magnetic resonance images to analyze dementia in Alzheimer's disease using logistic regression, k-nearest neighbors, support vector machines, decision tree, random forest, and Gaussian naive Bayes methods. Artificial intelligence can be successfully applied in monitoring and detecting medical conditions such as brain tumors (Bacanin et al., 2023; Kushwaha and Maidamwar, 2022), diabetes (Joshi and Borse, 2016), or COVID-19 (Zivkovic et al., 2022b). In Salehi et al. (2023)
In all tables that contain simulation results, the best score in every category is marked in bold.
model employing SVM and KNN was suggested by Dorai and Ponnambalam (2010) to categorize EEG epochs into seizure and non-seizure types. In another paper, the authors employed genetic algorithms, SVM, and particle swarm optimization to identify epileptic seizures (Hassan and Subasi, 2016). The SVM algorithm was also successfully used in another study (Lahmiri and Shmuel, 2018)

Materials and methods
This section first describes the baseline variant of the SCA metaheuristic and highlights its known drawbacks. Afterwards, the suggested improvements are presented and an enhanced version of the algorithm is proposed.

Original algorithm: SCA
The sine cosine algorithm (SCA) is a distinct optimization metaheuristic that draws its inspiration from the mathematical properties of trigonometric functions (Mirjalili, 2016). By utilizing sine and cosine functions, SCA updates the positions of solutions within the population, producing oscillations that explore the vicinity of the optimal solution. Since the outputs of these functions are confined to the range [−1, 1], the solutions keep varying around the destination point. During the initialization stage, a population of random candidate solutions is generated within the bounds of the search region. Stochastic, configurable control variables are employed throughout the algorithm's execution to guide both exploration and exploitation. Both the sine and the cosine function are used to update the positions of individual solutions during exploration and exploitation; the corresponding position-update formulas are given by Eqs. 2 and 3, respectively:
$$X_i^{t+1} = X_i^t + r_1 \cdot \sin(r_2) \cdot \left| r_3 P_i^* - X_i^t \right| \quad (2)$$
$$X_i^{t+1} = X_i^t + r_1 \cdot \cos(r_2) \cdot \left| r_3 P_i^* - X_i^t \right| \quad (3)$$
in which $X_i^t$ and $X_i^{t+1}$ represent the position of the current individual in the $i$th dimension at the $t$th and $(t+1)$th iteration, respectively, and $r_1$, $r_2$, and $r_3$ are pseudo-stochastically generated control parameters. Additionally, $P_i^*$ denotes the position of the destination point (i.e., the best estimate of the optimum found so far) in the $i$th dimension.
The two update rules are combined as depicted in Eq. 4, where the control variable $r_4$, randomly generated within the range [0, 1], determines whether the sine or the cosine function is utilized during the search process:
$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \cdot \sin(r_2) \cdot \left| r_3 P_i^* - X_i^t \right|, & r_4 < 0.5 \\ X_i^t + r_1 \cdot \cos(r_2) \cdot \left| r_3 P_i^* - X_i^t \right|, & r_4 \geq 0.5 \end{cases} \quad (4)$$
New values of the pseudo-stochastic control parameters are generated for each component of every individual in the population.
The main SCA parameters $r_1$, $r_2$, $r_3$, and $r_4$ play a crucial role in the behavior of the algorithm. Parameter $r_1$ governs the movement of the next solution, determining whether it moves away from or towards the destination. To increase randomization and promote exploration, $r_2$ is drawn from the range [0, 2π]. Parameter $r_3$ introduces a random weight on the destination, emphasizing its influence on the movement when $r_3 > 1$ and reducing it when $r_3 < 1$. Finally, $r_4$ selects between the sine and the cosine function in a given iteration. To better balance exploration and exploitation, the range of the sine and cosine functions is adaptively adjusted according to Eq. 5:
$$r_1 = a - t \cdot \frac{a}{T} \quad (5)$$
where $t$ denotes the current iteration, $T$ the maximum number of iterations per run, and $a$ is a constant.
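A minimal sketch of a single SCA position update combining Eqs. 2 to 5 is given below. It assumes a = 2 (the value commonly used with SCA), draws r_3 from [0, 2] so that it can either emphasize (r_3 > 1) or de-emphasize (r_3 < 1) the destination, and clips updated positions to the search bounds; the function names and the clipping step are assumptions rather than details taken from this work.

```python
import math
import random
from typing import List, Optional, Sequence


def r1_schedule(t: int, T: int, a: float = 2.0) -> float:
    """Eq. 5: r1 decreases linearly from a (at t = 0) to 0 (at t = T)."""
    return a - t * a / T


def sca_update(x: Sequence[float], dest: Sequence[float], t: int, T: int,
               lower: Sequence[float], upper: Sequence[float],
               a: float = 2.0, rng: Optional[random.Random] = None) -> List[float]:
    """Move one solution x relative to the destination point P* (Eqs. 2 to 4)."""
    rng = rng or random.Random()
    r1 = r1_schedule(t, T, a)
    new_x = []
    for j, (x_j, p_j) in enumerate(zip(x, dest)):
        # r2, r3 and r4 are redrawn for every component of the solution
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()
        wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)   # Eq. 4 switch
        candidate = x_j + r1 * wave * abs(r3 * p_j - x_j)
        new_x.append(min(max(candidate, lower[j]), upper[j]))  # clip to bounds
    return new_x


rng = random.Random(42)
x = [3.0, -2.0]
for t in range(10):
    x = sca_update(x, dest=[0.0, 0.0], t=t, T=10,
                   lower=[-5.0, -5.0], upper=[5.0, 5.0], rng=rng)
print(x)
```

At t = T the step size r_1 reaches zero, so the update leaves the solution unchanged; this is the gradual shift from exploration to exploitation that Eq. 5 encodes.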

Modified SCA algorithm
The SCA metaheuristic demonstrates remarkable performance on bound-constrained and unconstrained benchmarks while maintaining simplicity and a limited set of control parameters (Mirjalili, 2016). However, its performance on standard Congress on Evolutionary Computation (CEC) benchmarks reveals a tendency to converge too rapidly towards the current best solutions, reducing population diversity. This rapid convergence, coupled with its search directed towards P*, leads to unfavorable outcomes if the initial solutions are distant from the optimum. As a consequence, the algorithm may converge towards an unfavorable region of the search space and yield unsatisfactory final results.
To tackle the limitations of the original algorithm, this paper presents a modified version of SCA that incorporates two additional procedures into the baseline SCA metaheuristic, introduced to address its known shortcomings and further improve its performance:
1. the initial population is formed by a chaotic initialization of solutions, and
2. a self-adaptive search procedure alternates the search process between the elementary SCA search and the firefly algorithm (FA) search procedures.
The first modification to the basic SCA is a chaotic initialization of the population. This technique is intended to generate an initial set of individuals closer to the optimal region of the search space. The idea of incorporating chaotic maps into metaheuristic algorithms to enhance the search phase was proposed by Caponetto et al. (2003). Several other notable studies (Kose, 2018; Wang and Chen, 2020; Liu et al., 2021) have demonstrated that using chaotic sequences in the search procedure yields higher efficiency than traditional pseudo-random generators.
Among the various chaotic maps available, empirical simulations conducted with the SCA metaheuristic indicated that the logistic map produces the most promising outcomes. Therefore, at the beginning of its execution, the modified SCA employs the chaotic sequence β, initialized with the pseudo-random value $\beta_0$ and generated using the logistic map according to Eq. 6:
$$\beta_{i+1} = \mu \cdot \beta_i \cdot (1 - \beta_i), \quad i = 0, 1, \ldots, N - 1 \quad (6)$$
where $N$ and $\mu$ denote the number of individuals in the population and the chaotic control parameter, respectively. The parameter $\mu$ is set to 4, and $\beta_0$ must respect the limits $0 < \beta_0 < 1$ and $\beta_0 \neq 0.25, 0.5, 0.75, 1$.
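The following sketch iterates the logistic map of Eq. 6 with μ = 4 and illustrates why the listed seeds are excluded: 0.25 and 0.75 end on the fixed point 0.75, while 0.5 (and 1) collapse to 0, so the sequence is no longer chaotic. The function names are hypothetical.

```python
from typing import List

EXCLUDED_SEEDS = (0.25, 0.5, 0.75, 1.0)


def logistic_map(beta0: float, n: int, mu: float = 4.0) -> List[float]:
    """Iterate Eq. 6 without validation: returns [beta_0, ..., beta_{n-1}]."""
    seq = [beta0]
    for _ in range(n - 1):
        b = seq[-1]
        seq.append(mu * b * (1.0 - b))
    return seq


def chaotic_sequence(beta0: float, n: int, mu: float = 4.0) -> List[float]:
    """Eq. 6 with the seed restrictions stated in the text."""
    if not 0.0 < beta0 < 1.0 or beta0 in EXCLUDED_SEEDS:
        raise ValueError("beta0 must lie in (0, 1) and differ from 0.25, 0.5, 0.75")
    return logistic_map(beta0, n, mu)


# Why the listed seeds are excluded when mu = 4: they stop being chaotic at once.
print(logistic_map(0.25, 4))  # [0.25, 0.75, 0.75, 0.75]  (fixed point)
print(logistic_map(0.5, 4))   # [0.5, 1.0, 0.0, 0.0]      (absorbed at 0)
print(chaotic_sequence(0.7, 5))
```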
Step 1: Generate population Pop of N/2 solutions by employing the conventional initialization method: Step 4: Establish the current best solution P.
Algorithm 1. Pseudo-code that describes the chaotic-based initialization mechanism.
Each individual i is then mapped using the generated chaotic sequence, which is applied to every component of the individual as defined by the following equation: where $X^c_i$ corresponds to the new position of solution i after chaotic perturbation.
The complete process of generating the initial population using chaos-based initialization is presented in Algorithm 1. Importantly, this initialization mechanism does not increase the algorithm's complexity in terms of fitness function evaluations (FFEs), as it generates only N/2 random solutions and then maps those individuals to the corresponding N/2 chaotic-based solutions, so the population still contains N solutions.
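The sketch below follows the idea behind Algorithm 1: N/2 solutions are drawn uniformly within the bounds, each is mapped to a chaotic counterpart, and the best of the resulting N solutions is recorded. Since the mapping equation is not reproduced here, the code ASSUMES a simple per-component form, X^c = β · X clipped to the bounds, with β taken from the logistic map; it also assumes an even N and a minimization objective. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Solution = List[float]


def chaotic_init(n: int, lower: Sequence[float], upper: Sequence[float],
                 fitness: Callable[[Solution], float], rng: random.Random,
                 beta0: float = 0.7, mu: float = 4.0
                 ) -> Tuple[List[Solution], Solution]:
    """Sketch of Algorithm 1: N/2 uniform solutions plus their N/2 chaotic images."""
    if n % 2:
        raise ValueError("population size N is assumed to be even")
    # Step 1: conventional uniform initialization of N/2 solutions.
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(n // 2)]
    # Chaotic mapping. ASSUMPTION: X^c = beta * X per component (a stand-in for
    # the mapping equation), beta from the logistic map, result clipped to bounds.
    beta, pop_c = beta0, []
    for x in pop:
        x_c = []
        for j, x_j in enumerate(x):
            beta = mu * beta * (1.0 - beta)
            x_c.append(min(max(beta * x_j, lower[j]), upper[j]))
        pop_c.append(x_c)
    population = pop + pop_c                 # N solutions in total
    best = min(population, key=fitness)      # N FFEs; minimization assumed
    return population, best


def sphere(x: Solution) -> float:
    """Toy objective used only for the usage example."""
    return sum(v * v for v in x)


population, best = chaotic_init(10, [-5.0, -5.0], [5.0, 5.0], sphere,
                                random.Random(1))
print(len(population), best)
```

Only N fitness evaluations are spent (inside `min`), which mirrors the claim that the chaotic initialization does not add FFEs.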
The second modification to the baseline SCA is a self-adaptive search method that governs switching between the basic SCA search procedure and the FA search procedure (Yang, 2009), given by Eq. 8:
$$X_i^{t+1} = X_i^t + \beta_0 e^{-\gamma r_{i,j}^2} \left( X_j^t - X_i^t \right) + \alpha \kappa \quad (8)$$
where firefly i moves towards a brighter firefly j, $\beta_0$ is the attractiveness at zero distance (not to be confused with the chaotic seed of Eq. 6), $\gamma$ is the light absorption coefficient, $\alpha$ is a randomization value, and $\kappa$ denotes a random number drawn from the Gaussian distribution. The distance between fireflies i and j is denoted by $r_{i,j}$. The FA search capability is further improved by applying a dynamic α, as described by Yang and Slowik (2020). The modified SCA method alternates between the SCA and FA search mechanisms for each component j of every solution i as follows: if a pseudo-random number generated within [0, 1] is less than the search mode value (sm), the jth component of solution i is updated using the FA search (Eq. 8); otherwise, the plain SCA search (Eq. 4) is employed. The control variable sm thus balances the SCA and FA search procedures, placing greater emphasis on the FA search in the early rounds. In later phases, when the search space has been explored more extensively, the SCA search is triggered more frequently. This behavior is enabled by dynamically reducing the value of sm in each round t as follows: The initial sm value was determined empirically and set to 0.8 throughout the experiments in this research.
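A hedged sketch of the HASCA component-wise switch follows. Several details are assumptions made for illustration: the decay of sm is taken to be linear from sm0 = 0.8 (the text specifies only the initial value), the brighter solution j is supplied by the caller, the FA constants (`attract0`, `gamma`, `alpha`) are placeholder values, and the dynamic α of Yang and Slowik (2020) is replaced by a fixed α.

```python
import math
import random
from typing import List, Sequence


def sm_schedule(t: int, T: int, sm0: float = 0.8) -> float:
    """ASSUMED linear decay of the search-mode value from sm0 (t = 0) to 0 (t = T)."""
    return sm0 * (1.0 - t / T)


def hasca_update(x: Sequence[float], dest: Sequence[float],
                 brighter: Sequence[float], t: int, T: int,
                 lower: Sequence[float], upper: Sequence[float],
                 rng: random.Random, a: float = 2.0,
                 attract0: float = 1.0, gamma: float = 1.0,
                 alpha: float = 0.2, sm0: float = 0.8) -> List[float]:
    """Component-wise switch between the FA move (Eq. 8) and the SCA move (Eq. 4)."""
    sm = sm_schedule(t, T, sm0)
    r1 = a - t * a / T                                          # Eq. 5
    r_sq = sum((xi - bi) ** 2 for xi, bi in zip(x, brighter))   # r_ij squared
    attract = attract0 * math.exp(-gamma * r_sq)
    new_x = []
    for j, x_j in enumerate(x):
        if rng.random() < sm:
            # FA move towards the brighter solution plus a Gaussian perturbation
            v = x_j + attract * (brighter[j] - x_j) + alpha * rng.gauss(0.0, 1.0)
        else:
            # plain SCA move relative to the destination point P*
            r2 = rng.uniform(0.0, 2.0 * math.pi)
            r3 = rng.uniform(0.0, 2.0)
            wave = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
            v = x_j + r1 * wave * abs(r3 * dest[j] - x_j)
        new_x.append(min(max(v, lower[j]), upper[j]))           # clip to bounds
    return new_x


rng = random.Random(7)
x = [4.0, -4.0]
for t in range(10):
    x = hasca_update(x, dest=[0.0, 0.0], brighter=[0.5, 0.5], t=t, T=10,
                     lower=[-5.0, -5.0], upper=[5.0, 5.0], rng=rng)
print(x)
```

Early on, most components take the FA move; as sm shrinks, the SCA move dominates, and at t = T (where both sm and r_1 are zero) the solution is left unchanged.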
The modified SCA algorithm in fact represents a low-level hybrid, as it integrates the FA search procedure into the SCA algorithm. The novel method is named hybrid adaptive SCA (HASCA). The pseudo-code of the suggested algorithm is provided in Algorithm 2.
In conclusion, it is important to highlight that HASCA does not introduce any additional overhead compared with the baseline SCA method. The complexity in terms of fitness function evaluations (FFEs) for both the basic and the enhanced method is $O(N + N \cdot T) = O(N \cdot T)$, where N is the population size and T is the number of iterations: N evaluations are spent on the initial population and N evaluations in each of the T iterations.
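As a quick check of this bound, the snippet below counts FFEs for a hypothetical budget (N = 10, T = 15; not the settings used in the experiments) and confirms that the chaotic initialization evaluates exactly N solutions, the same as a plain initialization. The function names are hypothetical.

```python
def ffe_count(n: int, t_max: int) -> int:
    """Fitness function evaluations: N for the initial population plus N per iteration."""
    return n + n * t_max


def chaotic_init_ffes(n: int) -> int:
    """N/2 uniform solutions plus N/2 chaotic images, each evaluated once (even N)."""
    return n // 2 + n // 2


# Hypothetical budget: N = 10 agents and T = 15 iterations.
print(ffe_count(10, 15))      # 160
print(chaotic_init_ffes(10))  # 10, the same as a plain initialization of N
```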

Applied method
The proposed algorithm, including the modifications aimed at improving performance, is incorporated into a testing framework and compared with several state-of-the-art algorithms, as well as with the original base algorithm, to determine the improvements made. The algorithms are given search-space constraints for the RNN hyperparameters and are allocated a population size and a number of iterations within which to improve performance. Specific values are presented in the experimental section.
The framework is provided with a dataset of real-world EEG data, further described in the experimental section. One segment of the data is used for training and another for testing. Once hyperparameter optimization is performed, the algorithms are evaluated based on the objective function as well as on the other classification metrics described in the experimental setup.
Finally, the attained results for all the simulations are subjected to rigorous statistical analysis to determine the statistical significance of the introduced improvements.A flowchart of the described process is presented in Figure 1.

Experimental setup
This section describes the experimental setup of this work. The utilized dataset and the preparation procedure are described first, followed by the metrics used to evaluate each approach. Finally, the optimization setup is described in detail. All simulations have been carried out using the Python programming language and supporting libraries: TensorFlow, Pandas, Seaborn, and SHAP. Experimentation has been carried out on a machine running Windows 10 with an Intel i7 CPU, 32 GB of RAM, and an Nvidia 3060 GPU.

Dataset description and preprocessing
When working with medical data, several challenges arise. One major issue is dataset availability. Quality data is often not publicly available, limiting the capacity of outside researchers to build upon established techniques. This work therefore uses a publicly available labeled dataset (Andrzejak et al., 2001). However, while this dataset is properly labeled and well formatted, some preprocessing is needed to make it suitable for this research. The original dataset contains EEG data concerning five patients. Two of these are known to be neurotypical individuals, labeled A and B in the original data. Two patients, confirmed to be suffering from epilepsy, are labeled C and D in the original data. Finally, the fifth EEG data segment, labeled E in the original data, was captured during an active epileptic seizure. Each sample consists of 26 s of recorded data, totaling 4,096 data points from 100 electrodes.
These data segments have been recombined into two separate subsets used for the two experiments conducted in this research. The first dataset combined the EEG readings of a neurotypical individual (sample A) with those of a person suffering from epilepsy (sample C). Each of the samples was segmented into one-second intervals (158 data points) and randomly recombined to form a single continuous EEG reading.
A similar procedure was repeated for the dataset used in the second experiment. However, to better explore the potential of RNNs, this experiment was formulated as multiclass classification. The combination was created using data from a neurotypical individual (sample A), a patient confirmed to be suffering from epilepsy (sample D), and data captured during an active seizure (sample E). The data was once again recombined. The readings from the first ten electrodes in the constructed datasets can be seen in Figure 2. In this figure, a white background indicates normal readings, an orange background signifies anomalous activity, and a red background indicates readings taken during an active seizure.
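The segmentation and recombination procedure can be sketched as follows. The sketch assumes single-channel recordings, non-overlapping 158-sample windows with the incomplete tail dropped, and integer labels (0 normal, 1 anomalous, 2 seizure); the function names are hypothetical, and the constant signals in the usage are synthetic stand-ins, not EEG data.

```python
import random
from typing import Dict, List, Sequence, Tuple

WINDOW = 158  # samples per segment; 4,096 samples / 26 s is about 157.5 per second


def segment(signal: Sequence[float], window: int = WINDOW) -> List[List[float]]:
    """Split one recording into non-overlapping windows; a short tail is dropped."""
    return [list(signal[s:s + window])
            for s in range(0, len(signal) - window + 1, window)]


def build_dataset(recordings: Dict[int, Sequence[float]], rng: random.Random
                  ) -> Tuple[List[List[float]], List[int]]:
    """Segment every labelled recording, then shuffle the windows together."""
    pairs = [(w, label) for label, sig in recordings.items() for w in segment(sig)]
    rng.shuffle(pairs)
    windows = [w for w, _ in pairs]
    labels = [label for _, label in pairs]
    return windows, labels


# Synthetic single-channel stand-ins (not real EEG):
# experiment 1 would use {0: A, 1: C}; experiment 2 uses {0: A, 1: D, 2: E}.
rng = random.Random(0)
fake = {k: [float(k)] * 4096 for k in (0, 1, 2)}
windows, labels = build_dataset(fake, rng)
continuous = [v for w in windows for v in w]  # one recombined reading
print(len(windows), len(continuous))  # 75 11850
```

With 4,096 samples per recording, each recording yields 25 full windows (25 × 158 = 3,950 samples), so the three-class setup produces 75 labelled windows.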
For clarity, the first experiment, which uses a dataset combining classes A (normal) and C (anomalous) with a binary classification approach, will be referred to as 'experiment 1'. The second experiment, which uses a dataset combining classes A (normal), D (anomalous), and E (seizure) with a multiclass classification approach, will be referred to as 'experiment 2'.

Classification metrics
For each tested model, the traditional classification metrics (precision, recall, accuracy and F1-score) were evaluated. These metrics are defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP and TN stand for true positives and true negatives, respectively, whereas FP and FN stand for false positives and false negatives. The error rate was computed as 1 − accuracy. Cohen's kappa coefficient was also reported in the experiments (Warrens, 2015). This coefficient, as described in (McHugh, 2012), assesses inter-rater reliability and can also serve as an evaluation measure for the performance of the regarded classification models. In contrast to the overall accuracy of the model, which may be misleading when dealing with imbalanced datasets, Cohen's kappa accounts for the agreement expected by chance given the class distribution, yielding more dependable outcomes. This coefficient is calculated according to Eq. 14:

$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{14}$$

where $p_o$ denotes the observed agreement and $p_e$ the agreement expected by chance.
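The formulas above translate directly into code. The sketch below, in plain Python, computes the binary metrics from confusion-matrix counts and Cohen's kappa from a square confusion matrix of any size, so the kappa helper also covers the three-class setting of experiment 2. The function names and the example counts are illustrative only.

```python
def binary_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, F1-score and error rate from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "f1": f1, "error": 1 - accuracy}


def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix cm[true_class][predicted_class]."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / total  # observed agreement
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Toy counts: rows are the true classes (negative, positive).
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
kappa = cohen_kappa([[45, 5], [10, 40]])
```

For these made-up counts the accuracy is 0.85 (error 0.15), while κ is 0.7, because half of the agreement would be expected by chance given the row and column totals.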

FIGURE 4 Experiment 1 - confusion matrix, objective indicator joint plot, PR and ROC curves of the proposed HASCA method.

Optimization setup
To apply ML algorithms effectively, appropriate hyperparameter values must be selected so that the algorithm yields acceptable performance on the problem being tackled. However, the selection process can be considered an NP-hard challenge given the large number of possible combinations, rendering traditional methods inadequate. This work therefore utilizes metaheuristic optimization to select hyperparameters that attain the desired performance.
As stated above, two sets of experiments were conducted.The first experiment (experiment 1) focused on detecting anomalous activity, presented as a binary classification problem.The second experiment (experiment 2) dealt with determining the type of anomalous activity, structured as a multi-class classification problem.
For both experiments, the datasets were split in a standard way: 70% was allocated to training, 10% to validation, and the remaining 20% to testing. The dataset was normalized, and the number of lags was set to 15.
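A minimal sketch of this preprocessing is given below, under stated assumptions: the section does not specify the normalization method or whether the split preserves temporal order, so the code assumes min-max scaling fitted on the training portion only and a chronological split. It also assumes that each input window spans the previous 15 time steps and is labeled with the class of its final step. All function names are hypothetical.

```python
def split_indices(n):
    """Index ranges for a 70/10/20 train/validation/test split (chronological order assumed)."""
    n_train = n * 70 // 100
    n_val = n * 10 // 100
    return range(0, n_train), range(n_train, n_train + n_val), range(n_train + n_val, n)


def min_max_fit(rows):
    """Per-feature minimum and maximum, to be computed on the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(rows, mins, maxs):
    """Scale every feature with the training statistics; constant features map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]


def lagged_windows(rows, labels, lags=15):
    """Inputs are the last `lags` time steps; each target is the label of the final step."""
    starts = range(lags, len(rows) + 1)
    X = [rows[t - lags:t] for t in starts]
    y = [labels[t - 1] for t in starts]
    return X, y


# Toy usage: 30 time steps with two features each.
rows = [[float(i), 2.0 * i] for i in range(30)]
train, val, test = split_indices(len(rows))
mins, maxs = min_max_fit([rows[i] for i in train])
X, y = lagged_windows(min_max_apply(rows, mins, maxs), list(range(30)), lags=15)
```

Fitting the scaling statistics on the training rows alone avoids leaking information from the validation and test portions into training.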
Two types of parameters are optimized in this work. First, the training parameters of the RNN are selected, including the learning rate from the range [0.0001, 0.01] and dropout from the range [0.05, 0.2] for both experiments. The number of training epochs was selected from the range [30, 60] for the first experiment and from the range [50, 150] for the second experiment. An early stopping criterion is also used during training, set to one third of the maximum number of training epochs. Second, due to the significant influence of network architecture on performance, the RNN structure is optimized. The number of network layers is selected from the range [1, 2] for both experiments. Finally, the number of neurons in each layer was selected from the interval [lags/3, lags] for the first experiment and from the interval [lags/2, lags ⋅ 2] for the second experiment. The ranges used for the second experiment (training epochs and number of neurons) were increased to account for the greater complexity of the multi-class classification problem. Parameter ranges were determined empirically through trial and error and based on previous experience with hyperparameter optimization.
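The search space above can be written as a set of bounds, together with a decoder that turns a metaheuristic agent (a real-valued vector) into concrete RNN settings. The agent layout `[lr, dropout, epochs, layers, n1, n2]`, rounding integer parameters to the nearest value, rounding fractional neuron bounds up, and computing the early-stopping value as one third of the decoded epoch count are all assumptions made for illustration. `search_space` and `decode` are hypothetical names.

```python
import math

LAGS = 15


def search_space(experiment, lags=LAGS):
    """(lower, upper) bounds for each hyperparameter, following the ranges in the text."""
    if experiment == 1:
        epochs, neurons = (30, 60), (math.ceil(lags / 3), lags)
    else:
        epochs, neurons = (50, 150), (math.ceil(lags / 2), lags * 2)
    return {"learning_rate": (1e-4, 1e-2), "dropout": (0.05, 0.2),
            "epochs": epochs, "layers": (1, 2), "neurons": neurons}


def decode(agent, experiment, lags=LAGS):
    """Map a continuous agent [lr, dropout, epochs, layers, n1, n2] to RNN settings."""
    space = search_space(experiment, lags)

    def clip(value, bounds):
        return min(max(value, bounds[0]), bounds[1])

    epochs = int(round(clip(agent[2], space["epochs"])))
    layers = int(round(clip(agent[3], space["layers"])))
    # Only the first `layers` neuron counts are used; the rest are ignored.
    neurons = [int(round(clip(v, space["neurons"]))) for v in agent[4:4 + layers]]
    return {"learning_rate": clip(agent[0], space["learning_rate"]),
            "dropout": clip(agent[1], space["dropout"]),
            "epochs": epochs, "layers": layers, "neurons": neurons,
            "patience": max(1, epochs // 3)}  # early stopping: one third of the epochs


config = decode([0.001, 0.1, 45.6, 1.4, 10.2, 12.7], experiment=1)
```

Keeping the agent vector at a fixed length and ignoring unused neuron entries lets every metaheuristic operate on the same continuous search space regardless of the number of layers.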
To evaluate the optimization potential of the introduced algorithm, several state-of-the-art metaheuristics were also tasked with optimizing the RNN parameters under identical conditions. The tested algorithms include the original SCA (Mirjalili, 2016) as well as GA (Mirjalili and Mirjalili, 2019), PSO (Kennedy and Eberhart, 1995), FA (Yang, 2009), BSO (Shi, 2011), RSA (Abualigah et al., 2022), and COLSHADE (Gurrola-Ramos et al., 2020). Each metaheuristic was assigned six agents and allowed eight iterations to improve performance. Finally, to facilitate statistical analysis and account for the randomness associated with metaheuristic algorithms, testing was carried out over 30 independent runs.
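For reference, the position update of the baseline SCA (Mirjalili, 2016), on which HASCA builds, can be sketched as below. Each agent moves towards or around the best solution found so far using sine or cosine steps whose amplitude r1 decreases linearly from a = 2 towards zero, shifting the search from exploration to exploitation. The defaults mirror the six agents and eight iterations used here; the toy sphere objective stands in for the RNN validation error, and HASCA's chaotic initialization and FA search are deliberately not included.

```python
import math
import random


def sca_minimize(f, lb, ub, n_agents=6, n_iter=8, a=2.0, seed=0):
    """Baseline sine cosine algorithm (Mirjalili, 2016) minimising f within box bounds."""
    rng = random.Random(seed)
    dim = len(lb)
    # Uniform random initial population inside [lb, ub].
    X = [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
         for _ in range(n_agents)]
    fit = [f(x) for x in X]
    i_best = min(range(n_agents), key=lambda i: fit[i])
    best, best_f = X[i_best][:], fit[i_best]  # destination point P
    history = [best_f]
    for t in range(n_iter):
        r1 = a - t * a / n_iter  # linearly decreasing step amplitude
        for i in range(n_agents):
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x_new = X[i][j] + r1 * wave * abs(r3 * best[j] - X[i][j])
                X[i][j] = min(max(x_new, lb[j]), ub[j])  # keep inside the bounds
        # Evaluate the moved agents and update the destination point.
        for i in range(n_agents):
            fit[i] = f(X[i])
            if fit[i] < best_f:
                best, best_f = X[i][:], fit[i]
        history.append(best_f)
    return best, best_f, history


def sphere(x):
    return sum(v * v for v in x)


best, best_f, history = sca_minimize(sphere, [-5.0] * 3, [5.0] * 3)
```

In a hyperparameter-tuning setting, `f` would decode an agent into RNN settings, train the network, and return its validation classification error.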
The classification error was used as the objective function for both experiments, since the datasets are balanced. Cohen's kappa values are also reported.

Experimental outcomes, comparative analysis, validation and interpretation
This section presents the detailed simulation outcomes for both executed experiments. Afterwards, a statistical analysis of the experimental outcomes is conducted to determine whether the performance improvements are statistically significant. Finally, the best model's interpretation is provided at the end of this section. In all tables that contain simulation results, the best score in every category is marked bold.

Experiment 1 - Binary classification
The simulation outcomes of the proposed RNN-HASCA method and seven competitor methods for the binary classification problem are summarized in Table 1. As noted above, the objective function was the classification error, and the Cohen's kappa scores are provided as well. The overall objective function metrics across 30 independent executions, shown in Table 1, reveal that several metaheuristic algorithms were able to reach the same best value. However, due to the stochastic nature of metaheuristic algorithms, the worst and mean results are also important metrics, and on these the proposed HASCA algorithm exhibited superior performance. Regarding the worst metric, SCA, PSO, BSO and RSA finished second, behind HASCA. Regarding the mean metric, PSO attained second place, while RSA finished third. The RNN-HASCA method also obtained the best Cohen's kappa scores.
Detailed metrics achieved in the best individual run of every observed method are provided in Table 2. All observed algorithms attained respectable results. Finally, Table 3 lists the best set of RNN hyperparameters obtained by each algorithm. Notably, all methods produced networks with a single layer.
To present the obtained results more clearly, Figure 3 shows the box plots, violin plots, convergence diagram and swarm diversity plots. The HASCA method exhibits satisfactory convergence speed, and the box plots show that its results are very stable across the independent runs, whereas the other methods exhibit considerably larger deviation. Finally, Figure 4 depicts the confusion matrix, PR and ROC curves, as well as the objective indicator joint plot of the proposed HASCA algorithm.

Experiment 2 - Multiclass classification
The simulation outcomes of the proposed RNN-HASCA method and seven competitor methods for the multiclass classification problem are summarized in Table 4. As in the first experiment, the objective function was the classification error, and the Cohen's kappa scores are provided as well. The overall objective function metrics across 30 independent executions, shown in Table 4, reveal that the proposed HASCA algorithm exhibited superior performance, achieving the best scores for the best, worst and mean metrics, as well as for the standard deviation and variance. For the best metric, SCA attained the second-best score and COLSHADE finished third. Regarding the worst metric, SCA again finished second, behind HASCA. Regarding the mean metric, SCA attained second place, while COLSHADE finished third. The RNN-HASCA method also obtained the best Cohen's kappa scores.
Detailed metrics achieved in the best individual run of every observed method are provided in Table 5. In this multiclass scenario, the proposed HASCA obtained superior results, with an accuracy of 96.56%. Finally, Table 6 lists the best set of RNN hyperparameters obtained by each algorithm. Notably, HASCA again produced a network with a single layer.
To present the obtained results more clearly, Figure 5 shows the box plots, violin plots, convergence diagram and swarm diversity plots. The HASCA method exhibits excellent convergence speed, and the box plots show that its results are very stable across the independent runs, whereas the other methods exhibit considerably larger deviation. Finally, Figure 6 depicts the confusion matrix, PR and ROC curves, as well as the objective indicator joint plot of the proposed HASCA algorithm.

Statistical validation
To facilitate statistical analysis, the best samples were captured from 30 independent runs of each optimizer. Following this step, the safe use of parametric tests needed to be justified. To accomplish this, several criteria needed to be fulfilled (LaTorre et al., 2021), including independence, normality, and homoscedasticity of the data variance.
The first condition, independence, is fulfilled, as each run is initialized with a different random seed and accordingly a new set of random solutions is generated. To determine whether the normality condition is met, the Shapiro-Wilk test (Shapiro and Francia, 1972) for individual problem analysis is applied to the objective function outcomes of both the binary and multi-class experiments. Testing is conducted independently for each algorithm, and the resulting p-values are reported in Table 7.
Under the null hypothesis (H0) that the data originate from a normal distribution, the Shapiro-Wilk p-values shown in Table 7 indicate that the samples for Experiment 1 are closer to a normal distribution, while the samples for Experiment 2 deviate significantly. Nevertheless, since all p-values fall below the 0.05 significance level, H0 is rejected for every sample, and none of the samples meets the normality condition.
As the outcomes of the Shapiro-Wilk normality tests indicate that the samples do not fulfill the normality condition, the safe use of parametric tests is not justified.Therefore, the non-parametric Wilcoxon signed-rank test (Wilcoxon, 1992) is utilized.For this analysis, the proposed RNN-HASCA approach is utilized as the control.The outcomes of the Wilcoxon signed-rank test are shown in Table 8.
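The test can be sketched with the standard library alone. The version below is an illustrative implementation that uses the large-sample normal approximation, discards zero differences, and assigns average ranks (with a tie-corrected variance) to tied absolute differences. It assumes the 30 per-run results of the control method (RNN-HASCA) and of one competitor are paired by run index. For small samples its p-values may differ slightly from implementations that use the exact null distribution, such as `scipy.stats.wilcoxon`.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(control, competitor):
    """Two-sided Wilcoxon signed-rank test on paired samples (normal approximation).

    Returns (W+, p-value), where W+ is the sum of ranks of positive differences.
    """
    d = [a - b for a, b in zip(control, competitor) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, dk in zip(ranks, d) if dk > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

Because the objective is the classification error, a small W+ with a p-value below 0.05 indicates that the control method tends to achieve lower error than the competitor.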
The outcomes provided in Table 8 indicate that the statistical significance criterion is met in all cases but one: in experiment 1, the introduced metaheuristic does not show a statistically significant improvement over the PSO algorithm. This may be due to a favorable positioning attained by the PSO algorithm during randomized initialization. In experiment 1, several algorithms also attained matching p-values, which is consistent with the preceding results showing that these algorithms attain similar objective function outcomes. Nevertheless, in all cases except PSO in experiment 1, the improvements of the introduced algorithm are noticeable and statistically significant.

Best model interpretation
Traditionally, AI algorithms have often been treated as black boxes. While feature importance can be determined empirically for simple models, interpretability becomes more difficult as model complexity increases. In recent years, more focus has been placed on interpreting model decisions. Interpretation techniques help researchers understand and debug models, but also provide a deeper understanding of the influence of features on model decisions and therefore outcomes. One promising technique that leverages model approximations to determine feature importances is SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017). This approach relies on concepts from game theory to interpret and explain the output of any ML model.
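To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every coalition of features. Features outside a coalition are replaced by a baseline value, which is one simple choice among several ways of "removing" a feature. The SHAP library relies on approximations because exact enumeration grows as 2^n and would be infeasible for 100 electrodes; the linear model and numbers here are a toy example.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of every feature for the prediction model(x).

    Features outside a coalition take their baseline value (an illustrative choice).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear(z):
    return 2 * z[0] - z[1] + 0.5 * z[2]


phi = shapley_values(linear, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

For a linear model, each Shapley value reduces to the weight times the feature's deviation from the baseline, here 2, −2 and 1.5. The values always sum to model(x) − model(baseline), which is the additivity property that gives SHAP its name.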

FIGURE 7 SHAP interpretation outcomes for feature impacts.
In this work, SHAP has been leveraged to interpret the best-performing classification models generated by metaheuristics. These interpretations can prove useful for future research by suggesting which segments of the available data could be improved. Additionally, the interpretations can prove critical for diagnostic work, as they may reduce the number of electrodes needed to determine outcomes. The interpretation of the best-performing model optimized by metaheuristics is provided in Figure 7.
Figure 7 shows the importance of each electrode in the classification decisions of the best prediction model, as well as the ranges in which these features push a decision towards a normal or abnormal outcome. As can be observed, the impacts of all features are reasonable. However, electrodes labeled 95, 63, 91, and 47 play a more significant role in the decision-making process of the model.
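A global ranking of electrodes is commonly obtained by averaging absolute SHAP values over all samples, which is also the quantity shown by SHAP's bar summary plot. The helpers below are a hypothetical sketch of that ranking and of keeping only the top-k electrodes, as suggested for reducing the number of electrodes; the SHAP matrix in the usage example is made up and does not reproduce the reported results.

```python
def rank_features(shap_matrix, names):
    """Rank features by mean absolute SHAP value over samples (largest first).

    shap_matrix[s][j] is the SHAP value of feature j for sample s.
    """
    n_samples = len(shap_matrix)
    importance = [sum(abs(row[j]) for row in shap_matrix) / n_samples
                  for j in range(len(names))]
    order = sorted(range(len(names)), key=lambda j: importance[j], reverse=True)
    return [(names[j], importance[j]) for j in order]


def top_k(ranked, k):
    """Names of the k most influential features, e.g. the electrodes to retain."""
    return [name for name, _ in ranked[:k]]


# Made-up SHAP values for two samples and three electrodes.
ranked = rank_features([[0.1, -0.5, 0.0], [0.3, 0.4, -0.1]], ["e1", "e2", "e3"])
selected = top_k(ranked, 2)
```

Taking the absolute value before averaging keeps features that push strongly in both directions (towards normal and towards abnormal) from cancelling out.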

Conclusion
The research presented in this manuscript focused on medical EEG dataset classification through the application of a hybrid machine learning and metaheuristics approach. First, a novel hybrid variant of the well-known SCA metaheuristic was introduced, in which the deficiencies of the baseline SCA were addressed by adding a chaotic initialization of the population and incorporating FA search to enhance exploration. The devised metaheuristic, named HASCA, was then employed to tune the hyperparameters of the RNN.
The RNN-HASCA method was evaluated on an EEG medical dataset, and the results were compared to the performance of other cutting-edge metaheuristic algorithms. The introduced algorithm attained an accuracy of 99.8769%. The simulation outcomes indicate the superiority of the suggested method, which was also validated by a thorough statistical analysis of the simulation results. The statistical tests have shown that RNN-HASCA performs significantly better than the other regarded methods in all cases except against PSO in the binary classification experiment. Finally, the top-performing model was subjected to SHAP analysis in order to interpret the results and better understand the influence of the features on model decisions. The outcome of the SHAP analysis can be leveraged to reduce the number of features needed for detection and thus reduce the computational demands of the constructed classification models.
Future work in this domain should focus on two directions. First, the proposed methodology should be examined further by testing it on other medical data structured as time series, such as electrocardiograms (ECG). Second, the capabilities of the proposed HASCA method in tuning the hyperparameters of other machine learning models should be examined in other application domains, such as intrusion detection, image classification and stock price prediction.

FIGURE 1 Flowchart of the applied methodology.

FIGURE 2 Visualizations of the first ten electrode readings of the constructed binary (top) and multi-class (middle) datasets, and the dataset distribution (bottom) in each experiment.
authors presented a comprehensive review of the use of convolutional neural networks (CNNs) in medical imaging. The diagnosis and treatment of diseases depend heavily on medical imaging, and CNN-based models have shown considerable gains in image processing and classification tasks. CNNs have been successfully used for liver lesion classification (Frid-Adar et al., 2018) based on computed tomography images and for tumor identification (Dhiman et al., 2022). Metaheuristics-driven CNN tuning has proven successful in COVID-19 diagnostics as well (Pathan et al., 2021). Machine and deep learning models for epileptic seizure identification using EEG signals have been introduced in a number of research papers. Support vector machines (SVM), k-nearest neighbors (KNN), artificial neural networks (ANN), convolutional neural networks, and recurrent neural networks are a few examples of commonly used algorithms. A hybrid
10.3389/fphys.2023.1267011

…N, where rand(0, 1) represents a pseudo-random number within [0, 1], and LB and UB represent vectors with the lower and upper bounds of each solution's component i, respectively.
Step 2: Produce the chaotic population Pop_c of N/2 individuals by mapping the solutions that belong to Pop to chaotic sequences by employing Eqs. 6-7.
Step 3: Merge Pop and Pop_c (Pop ∪ Pop_c) and sort the merged collection of N individuals according to fitness value in ascending order.
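The three initialization steps can be sketched as below. The chaotic map of Eqs. 6-7 is not reproduced in this section, so the sketch assumes a logistic map with μ = 4, seeded with each component normalized to [0, 1] and iterated a fixed number of times; the step count, the even population size N, and the names `logistic_map` and `chaotic_initialization` are hypothetical choices for illustration, not the authors' implementation.

```python
import random


def logistic_map(c, steps, mu=4.0):
    """Iterate the logistic map c <- mu * c * (1 - c); stays within [0, 1] for mu = 4."""
    for _ in range(steps):
        c = mu * c * (1.0 - c)
    return c


def chaotic_initialization(f, lb, ub, n, steps=10, seed=0):
    """Hypothetical sketch: N/2 random plus N/2 chaotic solutions, sorted by fitness."""
    rng = random.Random(seed)
    dim = len(lb)
    half = n // 2  # assumes an even population size N
    # Step 1: uniform random solutions X = LB + rand(0, 1) * (UB - LB).
    pop = [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
           for _ in range(half)]
    # Step 2: map every solution onto a chaotic sequence (logistic map assumed).
    pop_c = []
    for x in pop:
        u = [(x[j] - lb[j]) / (ub[j] - lb[j]) for j in range(dim)]
        pop_c.append([lb[j] + logistic_map(u[j], steps) * (ub[j] - lb[j])
                      for j in range(dim)])
    # Step 3: merge both halves and sort by fitness in ascending order.
    merged = pop + pop_c
    merged.sort(key=f)
    return merged


def sphere(x):
    return sum(v * v for v in x)


population = chaotic_initialization(sphere, [-1.0, -1.0], [1.0, 1.0], n=6)
```

Seeding the chaotic half from the random half keeps the population size at N while spreading candidates more irregularly over the search space than uniform sampling alone.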
Algorithm 2. The HASCA pseudo-code.
Generate starting populace of N solutions by utilizing chaotic initialization (Algorithm 1)
Tune the control parameters and initialize dynamic parameters
while exit criteria have not been met do
    for each solution i do
        for each component j belonging to solution i do
            Generate arbitrary value rnd
            if rnd < sm then
                Update component j by executing FA search
    Evaluate the population based on the fitness function value
    Determine the current best solution P
    Refresh values of the dynamic parameters
end while
Return the best-discovered solution

FIGURE 3 Experiment 1 - box plot, violin plot, swarm diversity diagram and objective convergence diagram.
FIGURE 5 Experiment 2 - box plot, violin plot, swarm diversity diagram and objective convergence diagram.

FIGURE 6 Experiment 2 - confusion matrix, objective indicator joint plot, PR and ROC curves of the proposed HASCA method.

TABLE 1 Experiment 1 - overall objective and Cohen's kappa metrics for 30 runs.
to categorize seizures with 100% accuracy. As mentioned before, CNN was initially employed for image categorization.
A long short-term memory (LSTM) network is used by the authors in (Hussein et al., 2018) to demonstrate a deep learning-based technique that automatically recognizes the distinctive EEG features of epileptic episodes.

TABLE 4 Experiment 2 - overall objective and Cohen's kappa metrics for 30 runs.

TABLE 5 Experiment 2 - detailed metrics of the best run of each algorithm.

TABLE 7 Shapiro-Wilk normality tests.